Minimal Test Cost Feature Selection with Positive Region Constraint
نویسندگان
چکیده
Test cost is often required to obtain feature values of an object. When this issue is involved, people are often interested in schemes minimizing it. In many data mining applications, due to economic, technological and legal reasons, it is neither possible nor necessary to obtain a classifier with 100% accuracy. There may be an industrial standard to indicate the accuracy of the classification. In this paper, we consider such a situation and propose a new constraint satisfaction problem to address it. The constraint is expressed by the positive region; whereas the objective is to minimize the total test cost. The new problem is essentially a dual of the test cost constraint attribute reduction problem, which has been addressed recently. We propose a heuristic algorithm based on the information gain, the test cost, and a user specified parameter λ to deal with the new problem. Experimental results indicate the rational setting of λ is different among datasets, and the algorithm is especially stable when the test cost is subject to the Pareto distribution.
منابع مشابه
Feature Selection with Positive Region Constraint for Test-Cost-Sensitive Data
In many data mining and machine learning applications, data are not free, and there is a test cost for each data item. Due to economic, technological and legal reasons, it is neither possible nor necessary to obtain a classifier with 100% accuracy. In this paper, we consider such a situation and propose a new constraint satisfaction problem to address it. With this in mind, one has to minimize ...
متن کاملFeature selection with test cost constraint
Feature selection is an important preprocessing step in machine learning and data mining. In real-world applications, costs, including money, time and other resources, are required to acquire the features. In some cases, there is a test cost constraint due to limited resources. We shall deliberately select an informative and cheap feature subset for classification. This paper proposes the featu...
متن کاملA Novel Architecture for Detecting Phishing Webpages using Cost-based Feature Selection
Phishing is one of the luring techniques used to exploit personal information. A phishing webpage detection system (PWDS) extracts features to determine whether it is a phishing webpage or not. Selecting appropriate features improves the performance of PWDS. Performance criteria are detection accuracy and system response time. The major time consumed by PWDS arises from feature extraction that ...
متن کاملMinimal cost feature selection of data with normal distribution measurement errors
Minimal cost feature selection is devoted to obtain a trade-off between test costs and misclassification costs. This issue has been addressed recently on nominal data. In this paper, we consider numerical data with measurement errors and study minimal cost feature selection in this model. First, we build a data model with normal distribution measurement errors. Second, the neighborhood of each ...
متن کاملFeature Selection in Structural Health Monitoring Big Data Using a Meta-Heuristic Optimization Algorithm
This paper focuses on the processing of structural health monitoring (SHM) big data. Extracted features of a structure are reduced using an optimization algorithm to find a minimal subset of salient features by removing noisy, irrelevant and redundant data. The PSO-Harmony algorithm is introduced for feature selection to enhance the capability of the proposed method for processing the measure...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2012